
feat(go): add go deserialization support via io streams #3374

Merged
chaokunyang merged 40 commits into apache:main from
ayush00git:feat/go-deserialization
Mar 6, 2026

Conversation


@ayush00git ayush00git commented Feb 20, 2026

Why?

To enable stream-based deserialization in Fory's Go library, allowing direct reading from io.Reader without pre-buffering the entire payload. This improves efficiency for network- and file-based transport and brings the Go implementation to feature parity with the Python and C++ libraries.

What does this PR do?

1. Stream Infrastructure in go/fory/buffer.go

Enhanced ByteBuffer to support io.Reader with an internal sliding window and automatic filling.

  • Added reader io.Reader and minCap int fields.
  • Implemented fill(n int, err *Error) bool for on-demand data fetching and buffer compaction.
  • Added CheckReadable(n) and Skip(n) memory-safe routines that pull from the underlying stream when necessary to avoid out-of-bounds panics.
  • Updated ReadBinary and ReadBytes to safely copy slices when streaming to prevent silent data corruption on compaction.
  • Updated all Read* methods (fixed-size, varint, tagged) to fetch data from the reader safely if not cached.

2. Stateful InputStream in go/fory/stream.go

Added the InputStream feature to support true, stateful sequential stream reads.

  • Introduced InputStream which persists the buffered byte window and TypeResolver metadata (Meta Sharing) across multiple object decodes on the same stream, decoupled from Fory to mirror the C++ ForyInputStream implementation.
  • Added fory.DeserializeFromStream(is, target) method to process continuous streamed data.
  • Added Shrink() method to compact the internal buffer and reclaim memory during long-lived streams.
  • Added DeserializeFromReader method as an API for simple one-off stream object reads.

3. Stream-Safe Deserialization Paths

Updated internal deserialization pipelines in struct.go and type_def.go to be stream-safe:

  • Integrated CheckReadable bounds-checking into the struct.go fast paths for fixed-size primitives.
  • Safely rewrote schema-evolution skips (skipTypeDef) in type_def.go to use bounds-checked Skip() rather than unbounded readerIndex overrides.

4. Comprehensive Stream Tests

  • Built a custom oneByteReader wrapper (go/fory/test_helper_test.go) that artificially feeds the deserialization engine exactly 1 byte at a time.
  • Migrated the global test suite (struct_test.go, primitive_test.go, slice_primitive_test.go, etc.) to run all standard tests through this aggressive 1-byte fragmented stream reader via a new testDeserialize helper to guarantee total stream robustness.

Related issues

Closes #3302

Does this PR introduce any user-facing change?

  • New public APIs: NewInputStream, DeserializeFromStream, DeserializeFromReader, NewByteBufferFromReader.
  • Does this PR introduce any binary protocol compatibility change?

Benchmark

Main branch - (benchmark screenshot)

This branch - (benchmark screenshot)

@ayush00git ayush00git changed the title feat(go): add go desrialization support via transport streams feat(go): add go desrialization support via io streams Feb 20, 2026
@ayush00git
Contributor Author

Hey @chaokunyang
Please have a review and let me know what changes are needed.

@Zakir032002

hey @ayush00git, looked through this and the main issue i see is in DeserializeFromReader
it calls ResetWithReader at the start of every call:

func (f *Fory) DeserializeFromReader(r io.Reader, v any) error {
    defer f.resetReadState()
    f.readCtx.buffer.ResetWithReader(r, 0) // this wipes the prefetch window every time

so if fill() reads ahead past the first object boundary (which it will), those bytes
are gone on the next call. sequential decode from one stream is broken:

for {
    var msg Msg
    f.DeserializeFromReader(conn, &msg) // bytes after first object get thrown away
}

if you look at how this is handled for c++/python — the Buffer is constructed
from the stream once and passed to each deserialize call directly. the buffer holds
state across calls, it's never reset between objects. the python test
test_stream_deserialize_multiple_objects_from_single_stream shows this exactly —
the same reader buffer is passed to multiple fory.deserialize() calls.

the go version probably needs something similar — a stream reader type that owns the
buffer and gets reused across deserializations rather than resetting on each call.

Happy to discuss if I'm misreading the flow here

@ayush00git
Contributor Author

Hi @Zakir032002
Thanks for noticing this. You're exactly right, this is a bug in my implementation: the call would clear any prefetched data from the ByteBuffer, making sequential reads from the stream impossible, and it was clearing the type metadata as well. I'll look at the C++/Python implementations to correct the deserializer.

@Zakir032002

hey @ayush00git, one more thing — ReadBinary and ReadBytes return a direct slice into
b.data:

v := b.data[b.readerIndex : b.readerIndex+length]
return v

the problem is fill() compacts the buffer in-place:

copy(b.data, b.data[b.readerIndex:])

so if someone reads a []byte field and holds onto that slice, then the next
read triggers a fill() — the compaction just overwrote the bytes they're
still holding. no error, no panic, just wrong data.

in stream mode you probably want to copy before returning instead of aliasing:

if b.reader != nil {
    result := make([]byte, length)
    copy(result, b.data[b.readerIndex:b.readerIndex+length])
    b.readerIndex += length
    return result
}

in-memory path stays as is.
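The aliasing hazard is easy to demonstrate with a toy buffer (illustrative sketch, not the Fory code):

```go
package main

import "fmt"

// A slice returned as a direct view into the buffer's backing array is
// silently rewritten when a later compaction slides bytes to the front.
func main() {
	buf := []byte("AAAABBBB")
	readerIndex := 4 // first 4 bytes have been "read"

	aliased := buf[0:4]                      // read that aliases the buffer
	safe := append([]byte(nil), buf[0:4]...) // copy-on-read, as suggested above

	// Simulate fill()'s compaction: slide unread bytes to the front.
	copy(buf, buf[readerIndex:])

	fmt.Printf("aliased=%q safe=%q\n", aliased, safe) // aliased="BBBB" safe="AAAA"
}
```

No error and no panic: the aliased slice simply holds the wrong bytes, which is exactly the silent corruption described above.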

@Zakir032002

also noticed — ReadVarUint32Small7 only does fill(1) for the first byte, but if that byte has 0x80 set it falls through to continueReadVarUint32 which isn't touched in this PR. so in stream mode, if a multi-byte varint straddles a chunk boundary, the continuation bytes may not be in the buffer yet — you either get a BufferOutOfBoundError or silently read the wrong bytes depending on what's sitting at that position in the buffer.

easiest fix is probably just routing the multi-byte case through readVarUint32Slow since that's already stream-aware after your changes. or adding fill(1) guards inside continueReadVarUint32 directly, either works.

Happy to discuss if I'm misreading the flow here

@ayush00git
Contributor Author

Hey @Zakir032002
Sorry, I'm a bit busy with my exams; I'll review the comments as soon as I'm free.

@ayush00git
Contributor Author

Hi @Zakir032002
Thanks for pointing out the flaws.

  • The DeserializeFromReader reset and returning a direct slice into the data stream were wrongly implemented by me; thanks for suggesting the changes to fix them as well.

But I think you misunderstood ReadVarUint32Small7. We already have a check condition:

if len(b.data)-readIdx >= 5 {

}

If we are near a chunk boundary (fewer than 5 bytes remaining in the buffer), execution skips continueReadVarUint32 entirely and jumps straight to readVaruint36Slow. I don't think this part needs any changes.

@ayush00git
Contributor Author

I've added the StreamReader, which now creates a copied slice during deserialization to preserve the data between sequential deserialization calls. DeserializeFromReader is only there if the user wants to deserialize a single struct and doesn't want the stream overhead for that.

@ayush00git
Contributor Author

@chaokunyang
The API design now matches C++. Is there any other modification needed?

@ayush00git ayush00git requested a review from chaokunyang March 3, 2026 17:00
@chaokunyang
Collaborator

I added shrink_input_stream in #3453, could you add a similar feature in this PR? And I renamed StreamReader to InputStream in #3449, please also rename the related APIs. The name StreamReader is not that clear.

copy(b.data, b.data[b.readerIndex:])
b.writerIndex -= b.readerIndex
b.readerIndex = 0
b.data = b.data[:b.writerIndex]
Collaborator

Why do we need to process writerIndex here?

Contributor Author

We're using a sliding-window approach to push the unread bytes to the front, so if we only set readerIndex = 0 and kept writerIndex at its old position, it would track garbage values left in between.
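Concretely (toy example, not the Fory buffer):

```go
package main

import "fmt"

// After sliding unread bytes to the front, writerIndex must move down by the
// same amount, or remaining() would count stale bytes left behind the window.
func main() {
	data := []byte("xxxyz___") // "xxx" consumed, "yz" unread, "_" free space
	readerIndex, writerIndex := 3, 5

	copy(data, data[readerIndex:writerIndex]) // compact: data now "yzxyz___"
	writerIndex -= readerIndex                // 5 -> 2
	readerIndex = 0

	remaining := writerIndex - readerIndex
	fmt.Println(remaining, string(data[readerIndex:writerIndex])) // 2 yz
}
```

Skipping the writerIndex adjustment would leave remaining() at 5, counting the stale "xyz" bytes behind the compacted window.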

if b.readerIndex+8 > len(b.data) {
*err = BufferOutOfBoundError(b.readerIndex, 8, len(b.data))
return 0
if !b.fill(8, nil) {
Collaborator

Why do you pass nil? This will just silence the error.

Contributor Author

my bad. i'll fix it up

//go:inline
func (b *ByteBuffer) ReadVaruint36Small(err *Error) uint64 {
if b.remaining() >= 8 {
if b.remaining() >= 8 || (b.reader != nil && b.fill(8, nil)) {
Collaborator

If it's a network stream and the writer doesn't close the stream, then the stream EOF never comes; this will just hang forever.

Contributor Author

Yes, we should never hang on pre-filling 8 bytes. Let me fix it.

@chaokunyang
Collaborator

Some design suggestions:

  • Keep fast-path varint checks purely remaining >= N; do not call fill(N) in the predicate.
  • Never pass nil error sinks to fill in decode methods that promise bounds errors.
  • Make DeserializeFromReader either strictly stateless (always reset buffer) or explicitly stateful and align it with InputStream semantics.
  • Remove InputStream.reader unless it is actually needed (or use it for invariants/validation).
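The first two suggestions might look like this in code (a sketch with illustrative names, not Fory's actual implementation):

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// buf is an illustrative read buffer: the fast path inspects only already
// buffered bytes, and the slow path always receives a real error sink.
type buf struct {
	data        []byte
	readerIndex int
}

func (b *buf) remaining() int { return len(b.data) - b.readerIndex }

func (b *buf) readUint64(err *error) uint64 {
	if b.remaining() >= 8 { // fast path: purely a buffered-bytes predicate, no fill
		v := binary.LittleEndian.Uint64(b.data[b.readerIndex:])
		b.readerIndex += 8
		return v
	}
	return b.readUint64Slow(err) // slow path owns filling and error reporting
}

func (b *buf) readUint64Slow(err *error) uint64 {
	// In a real stream buffer this would fill(8, err); here we just report,
	// never passing a nil sink that would swallow the bounds error.
	*err = fmt.Errorf("need 8 bytes, have %d", b.remaining())
	return 0
}

func main() {
	b := &buf{data: []byte{1, 0, 0, 0, 0, 0, 0, 0, 9}}
	var err error
	v := b.readUint64(&err)
	fmt.Println(v, err) // 1 <nil>
	v = b.readUint64(&err)
	fmt.Println(v, err)
}
```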

@chaokunyang
Collaborator

And please run benchmarks/go for the current branch and the apache/main branch, and paste the results here.

@ayush00git
Contributor Author

Benchmark: apache/fory/main vs this branch (feat/go-deserialization)

apache/fory/main:

goos: linux
goarch: amd64
cpu: 12th Gen Intel(R) Core(TM) i5-1235U
BenchmarkFory_Struct_Serialize-12              34537162     110.4 ns/op      0 B/op    0 allocs/op
BenchmarkFory_Struct_Deserialize-12            19927696     179.6 ns/op     32 B/op    1 allocs/op
BenchmarkFory_StructList_Serialize-12           3048019    1176 ns/op        0 B/op    0 allocs/op
BenchmarkFory_StructList_Deserialize-12         1583568    2456 ns/op      688 B/op    3 allocs/op
BenchmarkFory_Sample_Serialize-12              14559171     247.5 ns/op      0 B/op    0 allocs/op
BenchmarkFory_Sample_Deserialize-12             3322015    1158 ns/op      676 B/op    9 allocs/op
BenchmarkFory_SampleList_Serialize-12            919764    3815 ns/op        0 B/op    0 allocs/op
BenchmarkFory_SampleList_Deserialize-12          240294   20650 ns/op    13952 B/op  163 allocs/op
BenchmarkFory_MediaContent_Serialize-12         7116196     458.0 ns/op      0 B/op    0 allocs/op
BenchmarkFory_MediaContent_Deserialize-12       2302726    1496 ns/op      656 B/op   13 allocs/op
BenchmarkFory_MediaContentList_Serialize-12      532838    7471 ns/op        0 B/op    0 allocs/op
BenchmarkFory_MediaContentList_Deserialize-12    143080   30065 ns/op    13040 B/op  243 allocs/op

this branch:

goos: linux
goarch: amd64
cpu: 12th Gen Intel(R) Core(TM) i5-1235U
BenchmarkFory_Struct_Serialize-12              32973771     121.9 ns/op      0 B/op    0 allocs/op
BenchmarkFory_Struct_Deserialize-12            27100645     163.6 ns/op     32 B/op    1 allocs/op
BenchmarkFory_StructList_Serialize-12           3265543    1103 ns/op        0 B/op    0 allocs/op
BenchmarkFory_StructList_Deserialize-12         1268991    2679 ns/op      688 B/op    3 allocs/op
BenchmarkFory_Sample_Serialize-12              12737720     258.3 ns/op      0 B/op    0 allocs/op
BenchmarkFory_Sample_Deserialize-12             3287950    1123 ns/op      676 B/op    9 allocs/op
BenchmarkFory_SampleList_Serialize-12            995695    3745 ns/op        0 B/op    0 allocs/op
BenchmarkFory_SampleList_Deserialize-12          231688   18862 ns/op    13952 B/op  163 allocs/op
BenchmarkFory_MediaContent_Serialize-12         6446667     542.3 ns/op      0 B/op    0 allocs/op
BenchmarkFory_MediaContent_Deserialize-12       2616025    1222 ns/op      656 B/op   13 allocs/op
BenchmarkFory_MediaContentList_Serialize-12      414025    8415 ns/op        0 B/op    0 allocs/op
BenchmarkFory_MediaContentList_Deserialize-12    172894   28446 ns/op    13040 B/op  243 allocs/op

@ayush00git
Contributor Author

@chaokunyang
I'm done with the changes and have pasted the benchmarks as well. Have a look now.


// Create a new stream reader. The stream context handles boundaries and compactions.
streamReader := NewInputStream(stream)
err = f.DeserializeFromStream(streamReader, v)
Collaborator

This helper deserializes into the same target twice (first from bytes, then from stream). That can mask stream-path bugs because values populated by the first pass may still be present if the second pass does not fully overwrite fields.

Please deserialize the stream path into a fresh value and compare results, so partial/incorrect stream decoding is detectable.

reader := &slowReader{data: data}
var decoded StreamTestStruct
// Use small minCap (16) to force frequent fills and compactions
f.readCtx.buffer.ResetWithReader(reader, 16)
Collaborator

This setup line does not affect the code path under test, because DeserializeFromReader immediately calls ResetWithReader(r, 0) internally and replaces the min-cap you set here.

If the goal is to verify small-cap refill/compaction behavior, consider using DeserializeFromStream with NewInputStreamWithMinCap(..., 16) so the configured capacity is actually used.

}

// NewInputStreamWithMinCap creates a new InputStream with a specified minimum buffer capacity.
func NewInputStreamWithMinCap(r io.Reader, minCap int) *InputStream {
Collaborator

This is a buffer size, not a capacity; please change the API name.

Contributor Author

I'll change it to NewInputStreamWithBufferSize, does that sound good? It would also match the API implemented in the C++ stream deserialization.

Collaborator

@chaokunyang chaokunyang left a comment

LGTM

@chaokunyang chaokunyang merged commit af6e8b2 into apache:main Mar 6, 2026
58 checks passed

Development

Successfully merging this pull request may close these issues.

[Go] Streaming Deserialization Support For Go
